VARIATION OF WORD FREQUENCIES IN RUSSIAN
LITERARY TEXTS 

VLADISLAV KARGIN 


Abstract 

We study the variation of word frequencies in Russian
literary texts. Our findings indicate that the standard
deviation of a word’s frequency across texts depends on
its average frequency according to a power law with exponent
0.62, showing that the rarer words have a relatively
larger degree of frequency volatility (i.e., “burstiness”).

Several latent factor models have been estimated to
investigate the structure of the word frequency distribution.
The dependence of a word’s frequency volatility on
its average frequency can be explained by the asymmetry
in the distribution of latent factors.


1. Introduction 


The study of word frequency variation in different texts arose first
in the problem of author attribution (Zipf 1932, Yule 1944, Mosteller
and Wallace 1964). Recently, the explosive growth in computing
power and in the volume of text data has led to many new applications.
For example, the text indexing problem asks to associate documents with
queries for fast retrieval; the authorship profiling problem requires
describing features of the author (sex, age, religious and political
beliefs, etc.) based on texts that the author produced. In addition, the
classic authorship attribution problem has found new applications in
security and forensics (see surveys by Holmes 1998, Juola 2008, Koppel,
Schler, and Argamon 2009, and Stamatatos 2009).

For all these applications, the fundamental statistical issue is the
distribution of word frequencies (see footnote 1) in different texts.
For example, if a word in a query has its frequency in a document higher
than its average frequency, then this document can be regarded as more
relevant to the query.

1 In this paper we use the term “frequency” as usual in statistics, that is, the
number of the word occurrences in a document divided by the document’s total
number of words.

Some properties of the word frequency distribution were noticed
a long time ago. For example, Zipf’s law (Zipf 1932) describes the
distribution of word frequencies in a particular text, and Heaps’ law
(p. 207 in Heaps 1978, p. 75 in Herdan 1966) relates the number of
distinct words in a text to its length. Some new research on these
laws was done in Font-Clos, Boleda, and Corral 2013, Gerlach and
Altmann 2013, Gerlach and Altmann 2014, and Piantadosi 2014. See
also surveys in Zanette 2014 and Altmann and Gerlach 2015. This paper
focuses on a different set of properties and investigates the variation of
word frequencies across documents.

One has to understand the structure of the word-document frequency
matrix for applications in information retrieval, in order
to handle the problems of word synonymy and polysemy. For this
purpose, tools have recently been developed such as LSA (“latent
semantic analysis”, Deerwester et al. 1990), pLSA (“probabilistic latent
semantic analysis”, Hofmann 1999b), and LDA (“latent Dirichlet
allocation”, Blei, Ng, and Jordan 2003). The main idea of these methods
is dimension reduction. The variation of word frequencies across
texts is assumed to stem mainly from the variation in a relatively small
number of factors (or “topics”) across texts.

The goal of this study is to establish basic facts about the fluctuations
of word frequencies across documents such as the dependence of
the fluctuation size on the average word frequency. In order to clarify
this dependence, we will apply the latent factor techniques such as
LSA, pLSA, and LDA.

The data for this study come from a large online library of Russian 
literary texts. This collection is especially appropriate for our study 
since it covers a very large spectrum of texts from various authors, 
genres and epochs. 

The paper is organized as follows. First, in Section 2 we describe the
data. Then, in Section 3 we study how the size of frequency fluctuations
across texts depends on the word’s average frequency. Next, in Sections
4 and 5 we apply factor models to analyze the variation of vocabulary
across texts in more detail. Finally, Section 6 concludes.


2. A preliminary look at the data 

We use data from Flibusta, a Russian online library. It covers Russian
and translated fiction works from many historical periods and
literary genres. Currently, it has between 200,000 and 300,000 texts
by about 85,000 authors, where the author is understood to include
translators and sometimes organizations that published a particular
text. Our analysis uses only a part of this dataset (around 25,000
books). In particular, we use only books which are available in a text
format (more precisely, in the “FB2” book format) and we exclude the
documents that are available only as pdf, djvu, doc, and other binary
formats.

The library works using the wiki principle and the texts are uploaded
by users; therefore the number of texts depends both on how many texts
were written by the author and on how many of them were uploaded
by users. Table 8 in the Appendix shows authors with the largest number
of texts.

The top place belongs to “Unknown Author”, which can be associated
with texts such as “Bhagavad Gita” or “Poetry of Medieval France”.
In the second place one sees a weekly political publication “Tomorrow”.

The third and fourth places belong to the American and Russian 
science fiction writers Ray Bradbury and Kir Bulychev, respectively. 
Many of the other top authors are authors and translators of books 
in popular genres such as science fiction, mystery, romance, action, 
historical fiction, sensational and how-to literature. 

The right portion of Table 8 in the Appendix shows the top 25 authors
after we excluded the “unknown author”, weekly publications, translators,
and the authors working in the genres associated with popular
culture. The result is a list of well-known authors, most of whom are
short story writers. For these authors, the number of texts in the online
library ranges from 446 for Anton Chekhov to 144 for Franz Kafka.


3. Variation of word frequencies across texts 


Suppose that $\xi_{b,w}^{(t)}$ is an indicator variable which equals 1 if
the word at place t in book b equals w. Then, the frequency of word w in
book b can be written as

$$x_{b,w} = \frac{1}{T_b} \sum_{t=1}^{T_b} \xi_{b,w}^{(t)}, \qquad (1)$$

where $T_b$ is the length of the book b.

First, let us take the hypothesis that for a given w the variables
$\xi_{b,w}^{(t)}$ are independent identically distributed random variables
with the expectation parameter $p_w$, which does not depend on b. Then
$E\, x_{b,w} = p_w$, and

$$V(x_{b,w}) = \frac{p_w (1 - p_w)}{T_b}. \qquad (2)$$
Figure 1. Normalized variance vs average frequency.


We can use (2) to check whether $\xi_{b,w}^{(t)}$ are i.i.d. variables. For
this purpose, we estimate $p_w$ by using the whole sample:

$$\hat{p}_w = \frac{1}{T} \sum_{b=1}^{B} \sum_{t=1}^{T_b} \xi_{b,w}^{(t)}, \qquad (3)$$

where T is the total number of words in the data and B is the number
of texts, and then we compute the normalized variance of $x_{b,w}$ across
books:

$$v_w = \frac{1}{B} \sum_{b=1}^{B} \left( \frac{\sqrt{T_b}\,(x_{b,w} - \hat{p}_w)}{\sqrt{\hat{p}_w (1 - \hat{p}_w)}} \right)^2. \qquad (4)$$

This statistic should be compared with 1.
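
The comparison with 1 can be illustrated on synthetic data. The sketch below is not the code used in the paper: the function name `normalized_variance`, the book lengths, and the gamma distribution for the book-level parameter are all assumptions made for illustration. It computes the statistic (4) once for counts generated under the i.i.d. hypothesis and once for counts whose underlying probability changes from book to book.

```python
import numpy as np

def normalized_variance(counts, lengths):
    """Statistic (4): mean over books of T_b (x_bw - p_w)^2 / (p_w (1 - p_w)).

    counts:  (B, N) array, counts[b, w] = occurrences of word w in book b
    lengths: (B,) array, lengths[b] = T_b, total number of words in book b
    """
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    x = counts / lengths[:, None]              # x_{b,w}, equation (1)
    p = counts.sum(axis=0) / lengths.sum()     # pooled estimate, equation (3)
    z2 = lengths[:, None] * (x - p) ** 2 / (p * (1.0 - p))
    return z2.mean(axis=0)

rng = np.random.default_rng(0)
B = 2000
lengths = rng.integers(20_000, 40_000, size=B)   # hypothetical book lengths

# Case 1: the i.i.d. hypothesis holds, p_w is the same in every book.
p_fixed = 1e-3
iid_counts = rng.binomial(lengths, p_fixed)[:, None]
v_iid = normalized_variance(iid_counts, lengths)[0]

# Case 2: p_{b,w} varies from book to book (gamma-distributed, an assumption).
p_var = rng.gamma(shape=2.0, scale=p_fixed / 2.0, size=B)
bursty_counts = rng.binomial(lengths, p_var)[:, None]
v_bursty = normalized_variance(bursty_counts, lengths)[0]

print(v_iid, v_bursty)
```

In the first case the statistic stays close to 1, while in the second it is much larger, which is the qualitative pattern that leads to the rejection of the i.i.d. model.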

The results are shown in Figure 1. They suggest that this model is
not acceptable and that there is a significant degree of variation in the
distribution of $\xi_{b,w}^{(t)}$ across texts.

This variation in the word frequency distribution across texts is at
the heart of most applications. However, its first systematic study is
relatively recent and was done in Church and Gale 1995. The phenomenon
is often called burstiness, after a measure of word frequency
variability which was used in Church and Gale (see footnote 2). One
interesting observation of Church and Gale is that the words with an
unusually high frequency variability are often content words: they have
an additional linguistic load.

2 The name “burstiness” comes from the observation that if a rare word has
occurred at least once in a document, then it is likely to occur more times in
the same document than is predicted by a Poisson distribution with the word’s
average frequency. This observation can be explained by the variability of the word
frequencies across texts since the observation of a word in a document changes the
posterior belief about the word frequency in this document.

Now, let the variables $\xi_{b,w}^{(t)}$ be independent random variables,
which are identically distributed conditional on b and w and have the
expectation parameter $p_{b,w}$. That is, the parameter is allowed to change
from text to text and we are interested in learning how it is distributed
across texts.

The simplest estimate for $p_{b,w}$ is
$x_{b,w} = \frac{1}{T_b} \sum_{t=1}^{T_b} \xi_{b,w}^{(t)}$. It is reliable
only if the standard deviation of the estimate is sufficiently small:

$$p_{b,w} \gg \sqrt{\frac{p_{b,w}(1 - p_{b,w})}{T_b}}, \qquad (5)$$

or $p_{b,w} \gg T_b^{-1}$.

In our database, the average text length is of the order of $3 \times 10^4$
words and therefore we can expect that $x_{b,w}$ reliably estimates $p_{b,w}$
only if $p_{b,w} \gg 10^{-4}$.

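Let us define the average word frequency:

$$\bar{x}_w = \frac{1}{B} \sum_{b=1}^{B} x_{b,w}, \qquad (6)$$

and the cross-text variance:

$$\sigma_w^2 = \frac{1}{B} \sum_{b=1}^{B} \left( x_{b,w} - \bar{x}_w \right)^2. \qquad (7)$$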

In the next pictures we order word types by their average frequency. 




Figure 2. The (estimated) expectation, second moment, and 
variance of the word frequency distribution. 
The pictures in Figure 2 suggest that in general the variance declines
together with the average frequency, so it is natural to ask about the
law of this dependence.


Figure 3. Normalized variance vs average frequency; 1000 of
the most frequent words. (Horizontal axis: average frequency $\bar{x}$.)


In Figure 3 the vertical axis shows the variance normalized by a
power of the average frequency:

$$V_w = \frac{\sigma_w^2}{\bar{x}_w^{1.25}}. \qquad (8)$$

The exponent $\kappa = 1.25$ was chosen to fit the data. We show the results
for 1,000 words with the largest frequency. These are the words for
which we can expect that the variance $\sigma_w^2$ is reliably estimated.
Figure 3 demonstrates that the variance follows the power law:

$$\sigma^2 \approx a\, \bar{x}^{1.25}, \qquad (9)$$

where a is a random variable which generally exceeds $4 \times 10^{-3}$.

Or, in terms of the ratio of the standard deviation to the mean:

$$\frac{\sigma}{\bar{x}} \approx a^{1/2}\, \bar{x}^{-0.375}. \qquad (10)$$

That is, the ratio increases for rarer words.
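
One common way to estimate such an exponent is a least-squares fit on the log-log scale. The sketch below is an assumption about how this could be done, not a description of the fitting procedure used for Figure 3: the helper `fit_power_law` and the synthetic (mean, variance) pairs are hypothetical. It also converts the fitted variance exponent into the exponent of the ratio in (10).

```python
import numpy as np

def fit_power_law(mean_freq, variance):
    """Least-squares fit of log(variance) = log(a) + kappa * log(mean_freq)."""
    log_x = np.log(np.asarray(mean_freq, dtype=float))
    log_v = np.log(np.asarray(variance, dtype=float))
    kappa, log_a = np.polyfit(log_x, log_v, deg=1)
    return kappa, np.exp(log_a)

# Synthetic stand-in for the (mean, variance) pairs of 1000 frequent words.
rng = np.random.default_rng(1)
mean_freq = np.exp(rng.uniform(np.log(1e-4), np.log(3e-2), size=1000))
true_kappa, true_a = 1.25, 1e-2
noise = np.exp(rng.normal(0.0, 0.3, size=1000))      # multiplicative scatter
variance = true_a * mean_freq ** true_kappa * noise

kappa, a = fit_power_law(mean_freq, variance)
ratio_exponent = kappa / 2.0 - 1.0   # exponent of sigma / mean, as in (10)
print(kappa, a, ratio_exponent)
```

With a variance exponent of 1.25, the standard deviation scales with exponent 0.625 and the ratio of the standard deviation to the mean with exponent 0.625 - 1 = -0.375.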

Figure 4 is similar except it also shows variances and average frequencies
for some of the less-frequent words. (We simply use the first
2000 different words that appeared in the data.) The conclusion drawn
from Figure 3 is not changed by Figure 4, although we observe some
deviations below the straight line for the normalized variance of
less-frequent words.

Figure 4. Normalized variance vs average frequency; a sample
of 2000 words. (Horizontal axis: average frequency $\bar{x}$.)


In summary, these observations show that there is a power-law dependence
between the variance of document word frequencies and the average
frequency. The frequent words have larger variation in frequency
across texts. However, the ratio of the standard deviation to the average
frequency increases as the average frequency becomes smaller.
This dependence follows a power law with an exponent of approximately
-0.375.

This relation can be seen as a quantification of the burstiness phenomenon.
In particular, it shows that burstiness is in general more
pronounced for rarer words. Hence, if volatility of a word’s frequency
(i.e., its burstiness) is used to evaluate the amount of content associated
with the word, then the volatility should be normalized by a function
of its frequency.

In the next section, we will try to uncover the structure in the variation
of document word frequencies using a factor model, which is a
variant of the LSA model.


4. A factor model for the vocabulary size variation 

In a factor model, expected word frequencies are allowed to change 
from text to text, as in the general random effects model. However, it 
is postulated that these changes can be explained by a relatively small 
number of factors. This approach is especially convenient for very large 
collections of data, when we are interested in reducing the complexity 
of the data, or, in other words, in “reducing the dimensionality” of an 
observed phenomenon. 
Let the empirical frequency distribution of word types in a book b
be denoted $x_b$. If the number of word types is N, then $x_b$ is an
N-vector. Each entry $(x_b)_w$ is the frequency of word type w in book b. In
particular, $\|x_b\|_1 = 1$.

The simplest factor model, which is a variant of the LSA model,
assumes that $x_b$ has a part which can be explained by a small number
of factors and a part which is an unexplained noise. Hence the model
is

$$X = \sum_{k=1}^{s} \theta_k f_k v_k^* + Z, \qquad (11)$$

where X is an N-by-B matrix whose columns are $x_b$, the empirical
frequency distributions of word types, and where Z is a noise matrix
(see footnote 3). We assume that $\{f_k\}$ is an orthonormal system of
N-vectors, and $\{v_k\}$ is an orthonormal system of B-vectors.

Every book b can be characterized by the vector
$\omega_b = ((v_1)_b, \ldots, (v_s)_b)$, and books with the same vector
$\omega$ are expected to have the same word frequency distribution up to
noise.

The simplest method to estimate $\theta_k$, $f_k$, and $v_k$ is to compute
the SVD (“Singular Value Decomposition”) of the matrix X and to use
only that part of the decomposition that corresponds to large singular
values.
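
The truncation step can be sketched as follows. This is an illustrative example on a small synthetic matrix, not the computation performed on the library data; the function `fit_factor_model`, the matrix sizes, and the two-topic generator are assumptions introduced here.

```python
import numpy as np

def fit_factor_model(X, s):
    """Rank-s approximation of X as in model (11), via truncated SVD.

    Returns theta (s,), F (N, s) with columns f_k, V (B, s) with columns v_k,
    and the residual Z = X - sum_k theta_k f_k v_k^T.
    """
    U, sing, Vt = np.linalg.svd(X, full_matrices=False)
    theta, F, V = sing[:s], U[:, :s], Vt[:s].T
    Z = X - (F * theta) @ V.T
    return theta, F, V, Z

# Synthetic frequency matrix: N word types, B books, two latent factors.
rng = np.random.default_rng(2)
N, B, T = 50, 200, 5000
topics = rng.dirichlet(np.ones(N), size=2)        # (2, N) word distributions
weights = rng.dirichlet([1.0, 1.0], size=B)       # (B, 2) mixture weights
probs = weights @ topics                          # (B, N) expected frequencies
X = np.stack([rng.multinomial(T, p) / T for p in probs], axis=1)  # (N, B)

theta, F, V, Z = fit_factor_model(X, s=2)
omega = V        # row b holds the factor coordinates of book b
explained = 1.0 - np.linalg.norm(Z) ** 2 / np.linalg.norm(X) ** 2
print(theta, explained)
```

The quantity `explained` is the share of the squared Frobenius norm of X captured by the rank-s part, which is one simple way to express the trade-off between approximation quality and model complexity.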

There are several benefits of this model. First, it has a straightforward
interpretation: the frequency matrix is approximated by a
small-rank matrix. Hence, we fit a parsimonious model to the data
and have a clear trade-off between the quality of the approximation
and the complexity of the model. Second, the model can be estimated
with efficient and fast SVD algorithms. Finally, the statistical literature
about factor models is rich and may provide some guidance about
the choice of the number of factors.

There are also significant deficiencies. The most important is that
the model ignores the fact that $x_b$ are the empirical frequency
distributions. This is especially troublesome for less-frequent words, when
most of the entries in $x_b$ are zeros.


3 The difference from the original LSA model is that here the decomposition is 
applied to the frequency matrix X rather than to the matrix of word counts in each 
document. In addition, the more recent implementations of the LSA method usually 
use “tf-idf” (term frequency, inverse document frequency) instead of raw counts. 
This correction often improves performance of the LSA in document indexing tasks. 
We will not use this modification in our version since it essentially removes the 
frequent words (like “the” and “in”) from consideration, and these words were found 
important in other tasks such as authorship attribution. 
The second deficiency is that most of the results about the number
of factors are derived under the assumption that Z has i.i.d. Gaussian
entries. In our situation, this assumption does not hold.

From the computational perspective, matrix X is very large (of order
$10^4$ by $10^6$), and it is computationally difficult to estimate its
spectral parameters. There are two alternative approaches to handle this
difficulty.

First, one can take a sample of texts and analyze the spectral data
using this sample. Second, the text-word matrix can be restricted to
the part that contains only the most frequent word types.

In this paper, we choose the second method, which uses the most
frequent words.

In particular, we computed eigenvalues $\theta_k$ and eigenvectors $f_k$ for
the 500 most frequent words. For the applications, one also needs to know
$v_k$, the eigenvectors of a large B-by-B matrix $X^*X$. Fortunately, they
can be easily computed:

$$v_k = \frac{X^* f_k}{\|X^* f_k\|_2}. \qquad (12)$$
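
The following sketch illustrates why the large B-by-B matrix never has to be formed. It uses a random placeholder matrix (an assumption, as are the sizes N and B) and relies on the standard fact that if $f_k$ is a unit eigenvector of $XX^*$, then $X^* f_k$ divided by the corresponding singular value of X is a unit eigenvector of $X^*X$. Here the singular value is the square root of the eigenvalue of $XX^*$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, B = 20, 500                       # few word types, many books
X = rng.random((N, B))               # placeholder for the frequency matrix

# Eigen-decomposition of the small N-by-N matrix X X^T.
eigvals, eigvecs = np.linalg.eigh(X @ X.T)
order = np.argsort(eigvals)[::-1]    # eigh returns eigenvalues in ascending order
eigvals, F = eigvals[order], eigvecs[:, order]

# Right singular vectors without forming the B-by-B matrix X^T X.
sing = np.sqrt(eigvals)              # singular values of X
V = (X.T @ F) / sing                 # column k is X^T f_k / (k-th singular value)

# Cross-check against a direct SVD (vectors are defined only up to sign).
U_ref, s_ref, Vt_ref = np.linalg.svd(X, full_matrices=False)
print(np.allclose(sing, s_ref))
```

The cost is dominated by one N-by-N eigenproblem and one matrix product, which is what makes the restriction to a few hundred frequent words computationally convenient.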


The four largest eigenvalues were found to be 91.6, 3.15, 1.76,
and 1.52. The first eigenvalue is much larger than the other ones and
corresponds to an eigenvector with positive entries. This eigenvector
can be interpreted as the average frequency distribution and all other
eigenvectors as “corrections”.

In order to estimate the number of factors, we note some stylized
facts from the theory of large random matrices (Baik and Silverstein
2006, Paul 2007, Benaych-Georges and Nadakuditi 2012). If a large
random matrix Z is deformed by a low-rank matrix, then the resulting
matrix X has a “bulk” spectrum that corresponds to the singular values of
Z and outlier singular values which correspond to the singular values
of the low-rank perturbation.

The plots for the eigenvalues and their spacings suggest that there
are at least 10 outliers that can be interpreted as detectable factors.
The plot of eigenvectors $f_k$ suggests that the eigenvectors are
concentrated on fewer than 100 of the most frequent words.


5. pLSA and LDA models 

Figure 5. Distribution of eigenvalues of $XX^*$. (The largest
eigenvalue is excluded.)

Figure 6. The eigenvectors $f_k$ for the first three largest eigenvalues
of $XX^*$.

In the pLSA (“probabilistic latent semantic analysis”) approach, the
true word frequencies in a document are modeled as a mixture of a few
probability distributions, which are interpreted as word distributions
belonging to a factor (or a “topic” in the terminology of text indexing
literature):

$$P(w|b) = \sum_{z=1}^{s} P(w|z) P(z|b). \qquad (13)$$

The interpretation is that for each word position in a book b we randomly
select a topic z and then select a word w with a probability that depends
only on this topic. In other words, given topic z, the probability of a
word w is independent of the book b. The model resembles the factor model
(11). However, its strong advantage is that this model treats the word
frequencies as a probability distribution in a true probability model.

Assuming further the independence of word occurrences in a document,
the model can be estimated by the log-likelihood maximization
with the following log-likelihood function:

$$\mathcal{L} = \sum_{b,w} n_{b,w} \log P(w|b), \qquad (14)$$

where $n_{b,w}$ is the number of occurrences of the word w in a book b. The
maximization can be performed by the EM method, although as usual,
there is a problem of local maxima. In addition, in our experience the
convergence rate was rather slow (see footnote 4).
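
A minimal sketch of the EM iteration for the mixture (13) with the log-likelihood (14) is given below. It follows the standard pLSA updates rather than the Matlab implementation mentioned in footnote 4; the function `plsa_em`, the random initialization, and the synthetic counts are assumptions for illustration.

```python
import numpy as np

def plsa_em(counts, s, n_iter=100, seed=0):
    """EM for pLSA. counts is a (B, N) array of word counts n_{b,w}.

    Returns P(z|b) as a (B, s) array, P(w|z) as an (s, N) array,
    and the log-likelihood (14) recorded after every iteration.
    """
    rng = np.random.default_rng(seed)
    B, N = counts.shape
    p_z_b = rng.dirichlet(np.ones(s), size=B)      # P(z|b), random start
    p_w_z = rng.dirichlet(np.ones(N), size=s)      # P(w|z), random start
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities P(z|b,w), shape (B, s, N).
        joint = p_z_b[:, :, None] * p_w_z[None, :, :]
        resp = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-300)
        # M-step: reweight by the observed counts and renormalise.
        weighted = counts[:, None, :] * resp
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_b = weighted.sum(axis=2)
        p_z_b /= p_z_b.sum(axis=1, keepdims=True)
        # Log-likelihood (14) under the updated parameters.
        p_w_b = np.maximum(p_z_b @ p_w_z, 1e-300)
        history.append(float((counts * np.log(p_w_b)).sum()))
    return p_z_b, p_w_z, history

# Synthetic corpus: 100 books, 30 word types, 3 topics, 2000 words per book.
rng = np.random.default_rng(4)
true_topics = rng.dirichlet(0.5 * np.ones(30), size=3)
true_mix = rng.dirichlet(0.5 * np.ones(3), size=100)
counts = np.stack([rng.multinomial(2000, m @ true_topics) for m in true_mix])

p_z_b, p_w_z, history = plsa_em(counts, s=3)
print(history[0], history[-1])
```

Each EM step cannot decrease the log-likelihood, so `history` is non-decreasing; a different seed may lead to a different local maximum, which is the difficulty noted in the text.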

The LDA (“latent Dirichlet allocation”) model is similar to pLSA
in that it is assumed that the distribution of words in a text is controlled
by an s-by-N matrix $\beta$, which is a matrix of conditional probabilities
of a word given a topic, $\beta_{zw} = P(w|z)$. Every document is associated
with a probability distribution over topics $\theta_b$, which is an s-vector
of conditional probabilities $(\theta_b)_z = P(z|b)$. The novel idea is to
treat the vector $\theta_b$ as a random variable drawn from a Dirichlet
distribution with an s-vector parameter $\alpha$.

The idea to treat conditional probabilities as random variables is
the key idea of hierarchical Bayesian modeling. In this particular
context, its main intention is to use the information about the distribution
of $\theta_b$ over all texts b in order to make more precise estimates of
a particular $\theta_b$.

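To restate, the joint distribution of the mixture $\theta$ and the sequences
of words $\{w_i\}$ and topics $\{z_i\}$ in a text b is

$$P(\theta, \{w_i\}, \{z_i\} \mid \alpha, \beta) = P(\theta|\alpha) \prod_{i=1}^{N_b} P(w_i|z_i, \beta)\, P(z_i|\theta), \qquad (15)$$

where $N_b$ is the number of words in the text b and $P(\theta|\alpha)$ is
the Dirichlet distribution with parameter $\alpha$.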
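
The generative reading of (15) can be made concrete by sampling a document. The sketch below is illustrative only: the functions `sample_lda_document` and `log_joint_given_theta`, the two-topic parameter values, and the vocabulary size are assumptions. It draws $\theta$, then a topic and a word for every position, and evaluates the logarithm of the product term in (15).

```python
import numpy as np

def sample_lda_document(alpha, beta, n_words, rng):
    """Draw one document from the LDA generative process behind (15).

    alpha: (s,) Dirichlet parameter; beta: (s, N) matrix with rows P(w|z).
    Returns theta (s,), topic indices z (n_words,), word indices w (n_words,).
    """
    theta = rng.dirichlet(alpha)                         # theta ~ P(theta | alpha)
    z = rng.choice(len(alpha), size=n_words, p=theta)    # z_i ~ P(z_i | theta)
    # w_i ~ P(w_i | z_i, beta): inverse-CDF sampling from the row beta[z_i].
    cdf = np.cumsum(beta, axis=1)
    u = rng.random(n_words)
    w = (u[:, None] > cdf[z]).sum(axis=1)
    w = np.minimum(w, beta.shape[1] - 1)                 # guard against round-off
    return theta, z, w

def log_joint_given_theta(theta, z, w, beta):
    """log of prod_i P(w_i | z_i, beta) P(z_i | theta), the product in (15)."""
    return float(np.sum(np.log(beta[z, w]) + np.log(theta[z])))

rng = np.random.default_rng(6)
alpha = np.array([1.0, 0.1])                  # hypothetical: a common and a rare topic
beta = rng.dirichlet(np.ones(20), size=2)     # hypothetical topic-to-word matrix
theta, z, w = sample_lda_document(alpha, beta, n_words=500, rng=rng)
log_joint = log_joint_given_theta(theta, z, w, beta)
print(theta, np.bincount(z, minlength=2) / 500, log_joint)
```

The empirical topic shares in the sampled document track the drawn $\theta$, which is the quantity whose posterior distribution has to be approximated when the model is fitted.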

The main task is to estimate the parameters $\alpha$ and $\beta$ and compute
the posterior distribution $P(\theta|\{w_i\})$. This is a non-trivial
computational problem. Several approximation algorithms are available. For
details, see the paper by Blei, Ng, and Jordan 2003. In our experiments we
used the code developed in Verbeek 2006.

The evaluations of practical benefits of LDA over pLSA differ. While
Blei, Ng, and Jordan 2003 found some benefits of LDA over pLSA
in the context of collaborative filtering, Masada, Kiyasu, and Miyahara
2008 found no advantage of LDA over pLSA in classification of Japanese
and Korean webpages.

The advantage of the LDA model for our purposes is that it can be
used to investigate the burstiness phenomenon. (For a related model,
the Dirichlet compound multinomial model, the burstiness was investigated
in Madsen, Kauchak, and Elkan 2005.)

4 In the model with 10 factors, 100 frequent words and approximately 27,000
books, the convergence from a random starting guess to the 6th digit took several
minutes and to the 9th digit took several hours, with some evidence that each new
digit of precision takes progressively more time. The code was implemented in
Matlab on a PC machine. We have also used the pLSA code from Verbeek 2006
for comparison. It yielded similar results.

In particular, we will use the LDA model to clarify results found in
Section 3. First, note that the probability of word w in a book b equals
$p_{b,w} = \sum_{z=1}^{s} \theta_{bz} \beta_{zw}$. Here $\theta_b$ is a
realization of a random vector $\theta$ distributed according to the
Dirichlet distribution with parameter $\alpha$. The joint moments of the
Dirichlet distribution are well-known:

$$E\left( \prod_{z=1}^{s} \theta_z^{k_z} \right) = \frac{\Gamma\left(\sum_i \alpha_i\right)}{\Gamma\left(\sum_i (\alpha_i + k_i)\right)} \times \prod_{i=1}^{s} \frac{\Gamma(\alpha_i + k_i)}{\Gamma(\alpha_i)},$$

and therefore one can easily compute the moments of the linear
combinations of $\theta_z$.

Consider, for simplicity, the case with only two factors and the symmetric
Dirichlet distribution. So, let $s = 2$ and $\alpha_1 = \alpha_2 = \alpha$.
Then the probability that a particular word in a book is a word w has a
distribution with the expectation:

$$E(p_w) = \frac{1}{2} (\beta_{1w} + \beta_{2w}),$$

and the variance can be computed as

$$V(p_w) = \frac{1}{4} \cdot \frac{1}{2\alpha + 1} (\beta_{1w} - \beta_{2w})^2.$$

If $\varepsilon_w = |\beta_{2w} - \beta_{1w}|/2$, then we could recover the
findings in Section 3 provided that $\varepsilon_w \sim (E p_w)^{\kappa/2}$
with $\kappa = 1.25$. The problem with this interpretation is that this
relation is impossible for small $E p_w$. Indeed, the positivity of
$\beta_{1w}$ and $\beta_{2w}$ implies that $\varepsilon_w < E p_w$, and this
contradicts the previous relation for small $E p_w$. This can also be seen
from the fact that $V(p_w) < (E p_w)^2$ in this model.

This can be rectified by using an asymmetric model. Take for example
$s = 2$, $\alpha_1 = 1$ and $\alpha_2 = \alpha$. In this case,

$$E(p_w) = \frac{1}{1 + \alpha} \beta_{1w} + \frac{\alpha}{1 + \alpha} \beta_{2w},$$

$$V(p_w) = \frac{\alpha}{(2 + \alpha)(1 + \alpha)^2} (\beta_{1w} - \beta_{2w})^2.$$

Let $\alpha \ll 1$ and $\beta_{1w} = \gamma_w \alpha \ll \beta_{2w}$. Then,

$$[E(p_w)]^2 \sim (\gamma_w + \beta_{2w})^2 \alpha^2$$

and

$$V(p_w) \sim \frac{\beta_{2w}^2}{2} \alpha.$$

Hence, $V(p_w) \gg [E(p_w)]^2$ provided that $\gamma_w$ is not too large
relative to $\beta_{2w}$.

Intuitively, the second topic occurs very rarely ($\alpha \ll 1$). However,
it is associated with a much larger conditional probability to observe
the word w: $\beta_{2w} \gg \beta_{1w}$. This leads to a relatively large
variance of the frequency distribution for the word w. In other words, the
high burstiness of the word w is due to its being a marker of a rare topic.

Next, we observe that when $\alpha$ is small and fixed, the power relation
$V(p_w) \sim [E(p_w)]^{1.25}$ is possible but only if
$\gamma_w \gg \beta_{2w}$. Since $\gamma_w = \beta_{1w}/\alpha$, it follows
that the relationship can occur in a limited range when
$\beta_{1w} \ll \beta_{2w} \ll \beta_{1w}/\alpha$. This range is wide only
if $\alpha$ is small.
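
The closed-form mean and variance of the asymmetric two-topic case can be checked numerically. In the sketch below the parameter values are hypothetical and chosen only so that $\alpha \ll 1$ and $\beta_{1w} \ll \beta_{2w}$; the function `two_topic_moments` simply encodes the two displayed formulas, and the Monte Carlo sample is an independent check.

```python
import numpy as np

def two_topic_moments(alpha, beta1, beta2):
    """Mean and variance of p_w = theta_1*beta1 + theta_2*beta2
    when (theta_1, theta_2) ~ Dirichlet(1, alpha)."""
    mean = (beta1 + alpha * beta2) / (1.0 + alpha)
    var = alpha * (beta1 - beta2) ** 2 / ((2.0 + alpha) * (1.0 + alpha) ** 2)
    return mean, var

alpha, beta1, beta2 = 0.05, 1e-4, 2e-2      # rare second topic, beta1 << beta2
mean, var = two_topic_moments(alpha, beta1, beta2)

# Monte Carlo check of the closed-form expressions.
rng = np.random.default_rng(5)
theta = rng.dirichlet([1.0, alpha], size=400_000)
p_w = theta @ np.array([beta1, beta2])
print(mean, p_w.mean())
print(var, p_w.var())
print(np.sqrt(var) / mean)                  # ratio of standard deviation to mean
```

For these values the standard deviation exceeds the mean, which cannot happen in the symmetric two-topic model, where $V(p_w) < (E p_w)^2$.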



Figure 7. The estimated parameters of the LDA model with
50 topics and 1000 most frequent words. (Panel title: estimated $\beta$,
the topic-to-word probability.)


In summary, the power relation observed in Section 3 appears to be
due to the asymmetry in the distribution of the topic vector $\theta$, and,
in particular, it is due to the existence of rare topics that are associated
with some specific words (“topic markers”).

In order to demonstrate the asymmetry in the distribution of topics
in the data, we show the estimates of the parameter $\alpha$, which is the
Dirichlet parameter for topics, and the parameter $\beta_{zw} = P(w|z)$ for
one of the rare topics z.

The left plot in Figure 7 shows the distribution of $\alpha$, which ranges
from 0.04 to 0.45. The right plot shows that a rare topic is indeed
associated with marker words. In this example, for the topic with
$\alpha = 0.04$, there are three relatively infrequent words with
$\beta > 0.03$. They are “всё” (“all”), “ещё” (“yet”), and “её” (“her”).
Their average frequencies are $3.8 \times 10^{-4}$, $2 \times 10^{-4}$, and
$1.9 \times 10^{-4}$, respectively. The common feature of these words is
the presence of the letter “ё”. This letter is often substituted by the
letter “е” to economize on typography costs, and its presence indicates
that either the book is intended for children or it has been published
recently with the help of computerized typography.


6. Conclusion 

In this paper we studied the variation in the vocabulary of Russian
literary texts from a large online database.

First, we detected a significant variation in the distribution of word
frequencies across texts, and found that the variance of this distribution
is in general larger for words with higher frequency. We found that the
dependence of the word frequency volatility on its mean has the form
of a power law with the exponent 0.625, which quantifies the observation
that rarer words have a greater degree of “burstiness”.

In order to study the variation in word frequencies across texts, we 
applied several variants of the factor analysis method. We found that 
most of the variation is concentrated in approximately 100 functional 
words and a significant portion of this variation can be explained by 
about 10 factors. An analysis of the LDA model suggests that the power 
dependence of the frequency volatility on its mean can be explained by 
an asymmetry in the prior distribution of topics. 
Appendix A. Tables


Table 8. Authors with largest number of texts

All authors                                                             Authors of classic prose
Author               N. of texts  Comment                               Author           N. of texts
Unknown Author       2442                                               Chekhov          446
«Tomorrow»           597          A weekly publication                  Maupassant       390
Bradbury             550                                                Gorky            379
Bulychev             540          Russian sci-fi writer                 Tolstoi          311
Asimov               508                                                Grin             295
Marina Serova        464          A group of mystery fiction writers    P. Neruda        245
Anton Chekhov        446                                                E. A. Poe        237
«CompuTerra»         437          A weekly publication                  Nabokov          227
Agatha Christie      433                                                Borges           224
Stephen King         392                                                O. Henry         217
Guy de Maupassant    390                                                Leskov           215
Maxim Gorky          379                                                Mark Twain       212
Arthur Conan Doyle   378                                                Dumas            185
Victor Weber         371          Translator                            Kuprin           183
Barbara Cartland     356                                                Kipling          179
Robert Sheckley      356                                                Bunin            171
Irina Gurova         353          Translator                            Bulgakov         165
Fedor Razzakov       348          A biographer of Russian media stars   Solzhenitsyn     164
Stanislaw Lem        341                                                L. Andreev       153
Leo Tolstoi          311                                                Petrushevskaya   152
Alexander Grin       295          Romantic novels set in a fantasy land Pushkin          151
Vladimir Goldich     291          Translator                            Balzak           150
Roger Zelazny        291                                                Hasek            147
Robert E. Howard     286                                                Shukshin         146
Tatiana Pertseva     275          Translator                            Kafka            144



